[57] ABSTRACT

A pipelined multiple issue architecture for a link layer or
protocol layer packet switch, which processes packets
independently and asynchronously, but reorders them into their
original order, thus preserving the original incoming packet
order. Each stage of the pipeline waits for the immediately
previous stage to complete, thus causing the packet switch
to be self-throttling and thus allowing differing protocols
and features to use the same architecture, even if possibly
requiring differing processing times. The multiple issue
pipeline is scaleable to greater parallel issue of packets, and
tunable to differing switch engine architectures, differing
interface speeds and widths, and differing clock rates and
buffer sizes. The packet switch comprises a fetch stage,
which fetches the packet header into one of a plurality of
fetch caches, a switching stage comprising a plurality of
switch engines, each of which independently and asynchronously
reads from corresponding fetch caches, makes
switching decisions, and writes to a reorder memory, a
reorder engine which reads from the reorder memory in the
packets' original order, and a post-processing stage,
comprising a post-process queue and a post-process engine,
which performs protocol-specific post-processing on the
packets.

40 Claims, 5 Drawing Sheets 
PIPELINED MULTIPLE ISSUE PACKET SWITCH

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates to a pipelined multiple issue packet
switch.

2. Description of Related Art

When computers are coupled together into networks for
communication, it is known to couple networks together and
to provide a switching device which is coupled to more than
one network. The switching device receives packets from
one network and retransmits those packets (possibly in
another format) on another network. In general, it is
desirable for the switching device to operate as quickly as
possible.

However, there are several constraints under which the
switching device must operate. First, packets may encapsulate
differing protocols, and thus may differ significantly in
length and in processing time. Second, when switching
packets from one network to another, it is generally required
that packets are re-transmitted in the same order as they
arrive. Because of these two constraints, known switching
device architectures are not able to take advantage of
significant parallelism in switching packets.

It is also desirable to account ahead of time for future
improvements in processing hardware, such as bandwidth
and speed of a network interface, clock speed of a switching
processor, and memory size of a packet buffer, so that the
design of the switching device is flexible and scaleable with
such improvements.

The following U.S. Patents may be pertinent:

U.S. Pat. No. 4,446,555 to Devault et al., "Time Division
Multiplex Switching Network For Multiservice Digital
Networks";

U.S. Pat. No. 5,212,686 to Joy et al., "Asynchronous Time
Division Switching Arrangement and A Method of
Operating Same";

U.S. Pat. No. 5,271,004 to Proctor et al., "Asynchronous
Transfer Mode Switching Arrangement Providing
Broadcast Transmission"; and

U.S. Pat. No. 5,307,343 to Bostica et al., "Basic Element
for the Connection Network of A Fast Packet Switching
Node".

Accordingly, it would be advantageous to provide an
improved architecture for a packet switch, which can make
packet switching decisions responsive to link layer (ISO
level 2) or protocol layer (ISO level 3) header information,
which is capable of high speed operation at relatively low
cost, and which is flexible and scaleable with future
improvements in processing hardware.

SUMMARY OF THE INVENTION

The invention provides a pipelined multiple issue link
layer or protocol layer packet switch, which processes
packets independently and asynchronously, but reorders
them into their original order, thus preserving the original
incoming packet order. Each stage of the pipeline waits for
the immediately previous stage to complete, thus causing the
packet switch to be self-throttling and thus allowing differing
protocols and features to use the same architecture, even
if possibly requiring differing processing times. The multiple
issue pipeline is scaleable to greater parallel issue of packets,
and tunable to differing switch engine architectures, differing
interface speeds and widths, and differing clock rates and
buffer sizes.

In a preferred embodiment, the packet switch comprises
a fetch stage, which fetches the packet header into one of a
plurality of fetch caches, a switching stage comprising a
plurality of switch engines, each of which independently and
asynchronously reads from corresponding fetch caches,
makes switching decisions, and writes to a reorder memory,
a reorder engine which reads from the reorder memory in the
packets' original order, and a post-processing stage,
comprising a post-process queue and a post-process engine,
which performs protocol-specific post-processing on the
packets.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the placement of a packet switch in an
internetwork.

FIG. 2 shows a block diagram of a packet switch. FIG. 2
comprises FIG. 2A and FIG. 2B collectively.

FIG. 3 shows a fetch stage for the packet switch.

FIG. 4 shows a block diagram of a system having a
plurality of packet switches in parallel.

DESCRIPTION OF THE PREFERRED EMBODIMENT

In the following description, a preferred embodiment of
the invention is described with regard to preferred process
steps, data structures, and switching techniques. However,
those skilled in the art would recognize, after perusal of this
application, that embodiments of the invention may be
implemented using a set of general purpose computers
operating under program control, and that modification of a
set of general purpose computers to implement the process
steps and data structures described herein would not require
undue invention.

The present invention may be used in conjunction with
technology disclosed in the following copending application:

Application Ser. No. 08/229,289, filed Apr. 18, 1994, in
the name of inventors Bruce A. Wilford, Bruce Sherry,
David Tsiang, and Anthony Li, titled "Packet Switching
Engine", now U.S. Pat. No. 5,509,006.

This application is hereby incorporated by reference as if
fully set forth herein, and is referred to herein as the "Packet
Switching Engine disclosure".

PIPELINED, MULTIPLE ISSUE PACKET SWITCH

FIG. 1 shows the placement of a packet switch in an
internetwork.

A packet switch 100 is coupled to a first network interface
101 coupled to a first network 102 and a second network
interface 101 coupled to a second network 102. When a
packet 103 is recognized by the first network interface 101
(i.e., the MAC address of the packet 103 is addressed to the
packet switch 100 or to an address known to be off the first
network 102), the packet 103 is stored in a packet memory
110 and a pointer to a packet header 104 for the packet 103
is generated.

In a preferred embodiment, the packet header 104
comprises a link layer (level 2) header, and a protocol layer
(level 3) header. The link layer header, sometimes called a
"MAC" (media access control) header, comprises information
for communicating the packet 103 on a network 102
using particular media, such as the first network 102. The
protocol layer header comprises information for level 3
switching of the packet 103 among networks 102. The link
layer header comprises information for level 2 switching
(i.e., bridging). For example, the link layer header may
comprise an ethernet, FDDI, or token ring header, while the
protocol layer header may comprise an IP header. Also, there
are hybrid switching techniques which respond to both
the level 2 and the level 3 headers, as well as those which
respond to level 4 headers (such as extended access lists).
Those skilled in the art will recognize, after perusal of this
application, that other types of packet headers or trailers are
within the scope and spirit of the invention, and that adapting
the invention to switching such packet headers would
not involve invention or undue experimentation.

The packet switch 100 reads the packet header 104 and
performs two tasks: (1) it rewrites the packet header 104, if
necessary, to conform to protocol rules for switching the
packet 103, and (2) it queues the packet 103 for transmission
on an output network interface 101 and thus an output
network 102. For example, if the output network 102
requires a new link layer header, the packet switch 100
rewrites the link layer header. If the protocol layer header
comprises a count of the number of times the packet 103 has
been switched, the packet switch 100 increments or decrements
that count, as appropriate, in the protocol layer header.

FIG. 2 shows a block diagram of a packet switch. FIG. 2
comprises FIG. 2A and FIG. 2B collectively.

The packet switch 100 comprises a fetch stage 210, a
switching stage 220, and a post-processing stage 230.

The pointer to the packet header 104 is coupled to the
fetch stage 210. The fetch stage 210 comprises a fetch
engine 211 and a plurality of (preferably two) fetch caches
212. Each fetch cache 212 comprises a double buffered
FIFO queue.

FIG. 2A shows a preferred embodiment in which there are
two fetch caches 212, while FIG. 2B shows an alternative
preferred embodiment in which there are four fetch caches
212.

In response to a signal from the switching stage 220, the
fetch engine 211 prefetches a block of M bytes of the packet
header 104 and stores that block in a selected FIFO queue of
a selected fetch cache 212. In a preferred embodiment, the
value of M, the size of the block, is independent of the
protocol embodied in the protocol link layer, and is preferably
about 64 bytes. In alternative embodiments, the value
[illegible] switch 100 operates most efficiently with a
selected mix of packets 103 it is expected to switch.

When the block of M bytes does not include the entire
packet header 104, the fetch engine 211 fetches, in response
to a signal from the fetch cache 212, a successive block of
L additional bytes of the packet header 104 and stores those
blocks in the selected FIFO queue of the selected fetch cache
212, thus increasing the amount of data presented to the
switching stage 220. In a preferred embodiment, the value of
L, the size of the additional blocks, is equal to the byte width
of an interface to the packet memory 110, and in a preferred
embodiment is about 8 bytes.

After storing at least a portion of a packet header 104 in
a fetch cache 212, the fetch engine 211 reads the next packet
header 104 and proceeds to read that packet header 104 and
store it in a next selected fetch cache 212. The fetch caches
212 are selected for storage in a round-robin manner. Thus
when there are N fetch caches 212, each particular fetch
cache 212 receives every Nth packet header 104 for storage;
when there are two fetch caches 212, each particular fetch
cache 212 receives every other packet header 104 for
storage.

Each fetch cache 212 is double buffered, so that the fetch
engine 211 may write a new packet header 104 to a fetch
cache 212 while the corresponding switch engine 221 is
reading from the fetch cache 212. This is in addition to the
fetch on demand operation described above, in which the
fetch engine 211 writes successive blocks of additional
bytes of an incomplete packet header 104 in response to a
signal from a switch engine 221. Thus, each particular fetch
cache 212 holds up to two packet headers 104, and when
there are N fetch caches 212, there are 2N packet
headers 104 pipelined in the fetch stage 210.

More generally, there may be N fetch caches 212, each of
which comprises B buffers, for a total of BN buffers. The
fetch engine 211 writes new packet headers 104 in sequence
to the N fetch caches 212 in order, and when the fetch engine
211 returns to a fetch cache 212 after writing in sequence to
all other fetch caches 212, it writes in sequence to the next
one of the B buffers within that fetch cache 212.

As shown below, the switching stage 220 comprises an
identical number N of switch engines 221, each of which
reads in sequence from one of the B buffers of its designated
fetch cache 212, returning to read from a buffer after reading
in sequence from all other buffers in that fetch cache 212.

In FIG. 2A, a preferred embodiment in which there are
two fetch caches 212, there are four packet headers 104
pipelined in the fetch stage 210, labeled n+3, n+2, n+1, and
n. In FIG. 2B, an alternative preferred embodiment in which
there are four fetch caches 212, there are eight packet
headers 104 pipelined in the fetch stage 210, labeled n+7,
n+6, n+5, n+4, n+3, n+2, n+1, and n.

The fetch stage 210 is further described with regard to
FIG. 3.

The switching stage 220 comprises a plurality of switch
engines 221, one for each fetch cache 212, and a
reorder/rewrite engine 222.

Each switch engine 221 is coupled to a corresponding
fetch cache 212. Each switch engine 221 independently and
asynchronously reads from its corresponding fetch cache 212,
makes a switching decision, and writes its results to one of
a plurality of (preferably two) reorder/rewrite memories 223
[illegible] the reorder/rewrite engine 222. Thus, when there
are N fetch caches 212, there are also N switch engines 221,
and when there are K reorder/rewrite memories 223 for each
switch engine 221, there are KN reorder/rewrite memories
223 in N sets of K.

FIG. 2A shows a preferred embodiment in which there are
two switch engines 221 and four reorder/rewrite memories
223, while FIG. 2B shows an alternative preferred embodiment
in which there are four switch engines 221 and eight
reorder/rewrite memories 223.

In a preferred embodiment, each switch engine 221
comprises a packet switch engine as shown in the Packet
Switching Engine disclosure. The switching results and
other data (e.g., statistical information) written into the
reorder/rewrite memories 223 comprise information regarding
how to [illegible] interface 101 [illegible] packet 103.
Preferably, this information comprises results registers as
described in the Packet Switching Engine disclosure, and
includes a pointer to the packet header 104 in the packet
memory 110.

Preferably, a single integrated circuit chip comprises
significant circuits of at least one, and preferably more than
one, switch engine 221.

As described in the Packet Switching Engine disclosure,
each switch engine 221 reads instructions from a "tree
memory" comprising instructions for reading and interpreting
successive bytes of the packet header 104. In a preferred
embodiment, the tree memory comprises a set of memory
registers coupled to the switch engine 221. In an alternative
embodiment, at least some of the tree memory may be
cached on the integrated circuit chip for the switch engine
221.

The reorder/rewrite engine 222 reads from the
reorder/rewrite memories 223 in a preselected order. The N
sets of K reorder/rewrite memories 223 are interleaved, so that
results from the switch engines 221 are read in a round-robin
manner. Thus, output from the reorder/rewrite engine 222 is
in the original order in which packets 103 arrived at the
packet switch 100.

Thus, each one of the switch engines 221 writes in
sequence to its K designated reorder/rewrite memories 223,
returning to one of its designated reorder/rewrite memories
223 after writing in sequence to its other designated
reorder/rewrite memories 223. In parallel, the reorder/rewrite
engine 222 reads in sequence from all the NK reorder/rewrite
memories 223, and returns to one of the NK reorder/rewrite
memories 223 after reading in sequence from all other
reorder/rewrite memories 223.

In FIG. 2A, a preferred embodiment in which there are
two switch engines 221 and four reorder/rewrite memories
223, there are four packet headers 104 pipelined in the
switching stage 220, labeled n+1, n, n-1, and n-2 (now
available). In FIG. 2B, an alternative preferred embodiment
in which there are four switch engines 221 and eight
reorder/rewrite memories 223, there are eight packet headers
104 pipelined in the switching stage 220, labeled n+3, n+2,
n+1, n, n-1, n-2, n-3, and n-4.

The reorder/rewrite engine 222, in addition to receiving
the packet headers 104 in their original order from the
reorder/rewrite memories 223, may also rewrite MAC headers
for the packet headers 104 in the packet memory 110, if
such rewrite is called for by the switching protocol.

The post-processing stage 230 comprises a post-processing
queue 231 and a post-processor 232.

The reorder/rewrite engine 222 writes the packet headers
104 into a FIFO queue of post-processing memories 231 in
the order it reads them from the reorder/rewrite memories
223. Because the queue is a FIFO queue, packet headers 104
leave the post-processing stage 230 in the same order they
enter, which is the original order in which packets 103
arrived at the packet switch 100.

The post-processor 232 performs protocol-specific operations
on the packet header 104. For example, the post-processor
232 increments hop counts and recomputes header
check-sums for IP packet headers 104. The post-processor
232 then queues the packet 103 for the designated output
network interface 101, or, if the packet 103 cannot be
switched, discards the packet 103 or queues it for processing
by a route server, if one exists.

In FIG. 2A, a preferred embodiment, and in FIG. 2B, an
alternative preferred embodiment, there are two
post-processing memories 231 in the FIFO queue for the
post-processing stage 230. In FIG. 2A there are two packet
headers 104 pipelined in the post-processing stage 230,
labeled n-3 and n-2. In FIG. 2B there are two packet headers
104 pipelined in the post-processing stage 230, labeled n-6
and n-5.

FIG. 2A, a preferred embodiment, and FIG. 2B, an
alternative preferred embodiment, show that there are several
packet headers 104 processed in parallel by the packet
switch 100. In general, where there are S switching engines
221, there are 3S+2 packet headers 104 processed in parallel
by the packet switch 100. Of these, 2S packet headers 104
are stored in the fetch stage 210, S packet headers 104 are
stored in the reorder/rewrite memories 223, and 2 packet
headers 104 are stored in the post-processing stage 230.

In a preferred embodiment, the packet memory 110 is
clocked at about 50 MHz and has a memory fetch path to the
fetch stage 210 which is eight bytes wide, there are two
switching engines 221, each of which operates at an average
switching speed of about 250 kilopackets switched per
second, and each stage of the packet switch 100 completes
operation within about 2 microseconds. Although each
switching engine 221 is individually only about half as fast
as the pipeline processing speed, the accumulated effect
when using a plurality of switching engines 221 is to add
their effect, producing an average switching speed for the
packet switch 100 of about 500 kilopackets switched per
second when the pipeline is balanced.

In an alternative preferred embodiment, each switching
engine 221 operates at an average switching speed of about
125 kilopackets switched per second, producing an average
switching speed for the packet switch 100 of about 250
kilopackets switched per second when the pipeline is
balanced. Because the pipeline is limited by its slowest stage,
the overall speed of the packet switch 100 is tunable by
adjustment of parameters for its architecture, including
speed of the memory, width of the memory fetch path, size
of the cache buffers, and other variables. Such tunability
allows the packet switch 100 to achieve satisfactory
performance at a reduced cost.

FETCH ENGINE AND FETCH MEMORIES

FIG. 3 shows a fetch stage for the packet switch.

The fetch engine 211 comprises a state machine 300
having signal inputs coupled to the packet memory 110 and
to the switching stage 220, and having signal outputs
coupled to the switching stage 220.

A packet ready signal 301 is coupled to the fetch engine
211 from the packet memory 110 and indicates whether
there is a packet header 104 ready to be fetched. In this
description of the fetch engine 211, it is presumed that
packets 103 arrive quickly enough that the packet ready
signal 301 indicates that there is a packet header 104 ready
to be fetched at substantially all times. If the fetch engine
211 fetches packet headers 104 quicker than those packet
headers 104 arrive, at some times the fetch engine 211 (and
the downstream elements of the packet switch 100) will
have to wait for more packets 103 to switch.

A switch ready signal 302 is coupled to the fetch engine
211 from each of the switch engines 221 and indicates
whether the switch engine 221 is ready to receive a new
packet header 104 for switching.

A data available (or cache ready) signal 303 is coupled to
each of the switch engines 221 from the fetch engine 211 and
indicates whether a packet header 104 is present in the fetch
cache 212 for switching.

A cache empty signal 304 is coupled to the fetch engine
211 from each of the fetch caches 212 and indicates whether
the corresponding switch engine 221 has read all the data
from the packet header 104 supplied by the fetch engine 211.

A data not required signal 307 is coupled to the fetch engine
211 from each of the switch engines 221 and indicates
whether the switch engine 221 needs further data loaded into
the fetch cache 212.

It may occur that the switch engine 221 is able to make its
switching decision without need for further data from the
packet header 104, even though the switch engine 221 has
read all the data from the packet header 104 supplied by the
fetch engine 211. In this event, the switch engine 221 sets the
data not required signal 307 to inform the fetch engine 211
that no further data should be supplied, even though the
cache empty signal 304 has been triggered.

It may also occur that the switch engine 221 is able to
determine that it can make its switching decision within the
data already available, even if it has not made that switching
decision yet. For example, in the IP protocol, it is generally
possible to make the switching decision with reference only
to the first 64 bytes of the packet header 104. If the switch
engine 221 is able to determine that a packet header 104 is
an IP packet header, it can set the data not required signal
307.

A read pointer 305 is coupled to each of the fetch caches
212 from the corresponding switch engine 221 and indicates
a location in the fetch cache 212 where the switch engine
221 is about to read a word (of a packet header 104) from
the fetch cache 212.

A write pointer 306 is coupled to each of the fetch caches
212 from the fetch engine 211 and indicates a location in the
fetch cache 212 where the fetch engine 211 is about to write
a word (of a packet header 104) to the fetch cache 212.

A first pair of fetch caches 212 (labeled "0" and "1") and
a second pair of fetch caches 212 (labeled "2" and "3") each
comprise dual port random access memory (RAM), preferably
a pair of 16 word long by 32 bit wide dual port RAM
circuits disposed to respond to addresses as a single 16 word
long by 64 bit wide dual port RAM circuit.

A 64 bit wide data bus 310 is coupled to a data input for
each of the fetch caches 212.

The read pointers 305 for the first pair of the fetch caches
212 (labeled as "0" and "1") are coupled to a first read
address bus 311 for the fetch caches 212 using a first read
address multiplexer 312. The two read pointers 305 are data
inputs to the read address multiplexer 312; a select input to
the read address multiplexer 312 is coupled to an output of
the fetch engine 211. Similarly, the read pointers 305 for the
second pair of the fetch caches 212 (labeled as "2" and "3")
are coupled to a second read address bus 311 for the fetch
caches 212 using a second read address multiplexer 312, and
selected by an output of the fetch engine 211.

Similarly, the write pointers 306 for the first pair of the
fetch caches 212 (labeled as "0" and "1") are coupled to a
first write address bus 313 for the fetch caches 212 using a
first write address multiplexer 314. The two write pointers
306 are data inputs to the write address multiplexer 314; a
select input to the write address multiplexer 314 is coupled
to an output of the fetch engine 211. Similarly, the write
pointers 306 for the second pair of the fetch caches 212
(labeled as "2" and "3") are coupled to a second write
address bus 313 for the fetch caches 212 using a second
write address multiplexer 314, and selected by an output of
the fetch engine 211.

An output 315 for the first pair of fetch caches 212 is
coupled to a byte multiplexer 316. The byte multiplexer 316
selects one of eight bytes of output data, and is selected by
an output of a byte select multiplexer 317. The byte select
multiplexer 317 is coupled to a byte address (the three least
significant bits of the read pointer 305) for each of the first
pair of fetch caches 212, and is selected by an output of the
fetch engine 211.

An initial value for the byte address (the three least
significant bits of the read pointer 305) may be set by the
state machine 300 to allow a first byte of the packet header
104 to be offset from (i.e., not aligned with) an eight-byte
block in the packet memory 110. The state machine 300
resets the byte address to zero for successive sets of eight
bytes to be fetched from the packet memory 110.

Similarly, an output 315 for the second pair of fetch
caches 212 is coupled to a byte multiplexer 316. The byte
multiplexer 316 selects one of eight bytes of output data, and
is selected by an output of a byte select multiplexer 317. The
byte select multiplexer 317 is coupled to a byte address (the
three least significant bits of the read pointer 305) for each
of the second pair of fetch caches 212, and is selected by an
output of the fetch engine 211. The byte multiplexers 316 are
coupled to the switching stage 220.

As described with regard to FIG. 2, the fetch engine 211
responds to the switch ready signal 302 from a switch engine
221 by prefetching the first M bytes of the packet header 104
from the packet memory 110 into the corresponding fetch
cache 212. To perform this task, the fetch engine 211 selects
the write pointer 306 for the corresponding fetch cache 212
using the corresponding write address multiplexer 314,
writes M bytes into the corresponding fetch cache 212, and
updates the write pointer 306.

As described with regard to FIG. 2, the fetch cache 212
raises the cache empty signal 304 when the read pointer 305
approaches the write pointer 306, such as when the read
pointer 305 is within eight bytes of the write pointer 306.
The fetch engine 211 responds to the cache empty signal 304
by fetching the next L bytes of the packet header 104 from
the packet memory 110 into the corresponding fetch cache
212, unless disabled by the data not required signal 307 from
the switch engine 221. To perform this task, the fetch engine
211 proceeds in like manner as when it prefetched the first
M bytes of the packet header 104.

In a preferred embodiment, the fetch cache 212 includes
a "watermark" register (not shown) which records an
address value which indicates, when the read pointer 305
reaches that address value, that more data should be fetched.
For example, the watermark register may record a value just
eight bytes before the write pointer 306, so that more data
will only be fetched when the switch engine 221 is actually
out of data, or the watermark register may record a value
more bytes before the write pointer 306, so that more data
will be fetched ahead of actual need. Too-early values may
result in data being fetched ahead of time without need,
while too-late values may result in the switch engine 221
having to wait. Accordingly, the value recorded in the
watermark register can be adjusted to better match the rate
at which data is fetched to the rate at which data is used by
the switch engine 221.

While the switch engine 221 reads from the fetch cache
212, the fetch engine 211 prefetches the first M bytes of
another packet header 104 from the packet memory 110 into
another fetch cache 212 (which may eventually comprise the
other fetch cache 212 of the pair). To perform this task, the
fetch engine 211 selects the write pointer 306 for the
recipient fetch cache 212 using the corresponding write
address multiplexer 314, writes M bytes into the recipient
fetch cache 212, and updates the corresponding write pointer
306.

The switch engines 221 are each coupled to the read
pointer 305 for their corresponding fetch cache 212. Each
switch engine 221 independently and asynchronously reads
from its corresponding fetch cache 212 and processes the
packet header 104 therein. To perform this task, the switch
engine 221 reads one byte at a time from the output of the
output multiplexer 320 and updates the corresponding byte
address (the three least significant bits of the read pointer
305). When the read pointer 305 approaches the write
pointer 306, the cache empty signal 304 is raised and the fetch
engine 211 fetches L additional bytes "on demand".
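
The prefetch and fetch-on-demand behavior described above can be sketched as a toy software model. This is a minimal illustration only, not the hardware design: the class name `FetchCache`, its fields, and the rule "fetch when the read pointer is within `watermark_gap` bytes of the write pointer" are assumptions made for illustration, with M = 64 and L = 8 taken from the preferred values in the text.

```python
M = 64  # initial prefetch size in bytes (preferred value in the text)
L = 8   # on-demand block size in bytes (packet memory interface width)


class FetchCache:
    """Hypothetical model of one fetch cache buffer with read/write pointers."""

    def __init__(self, header: bytes, watermark_gap: int = 8):
        self.header = header              # full header as held in packet memory
        self.data = bytearray()           # bytes fetched into the cache so far
        self.read_ptr = 0                 # next byte the switch engine will read
        self.watermark_gap = watermark_gap
        self.data_not_required = False    # set by the switch engine (signal 307)
        self.on_demand_fetches = 0
        self._fetch(M)                    # prefetch the first M bytes

    @property
    def write_ptr(self) -> int:
        return len(self.data)

    def _fetch(self, count: int) -> None:
        # Copy the next `count` header bytes from packet memory into the cache.
        start = self.write_ptr
        self.data += self.header[start:start + count]

    def cache_empty(self) -> bool:
        # Models the cache empty signal: read pointer is close to write pointer.
        return self.write_ptr - self.read_ptr <= self.watermark_gap

    def read_byte(self) -> int:
        more_in_memory = self.write_ptr < len(self.header)
        if self.cache_empty() and more_in_memory and not self.data_not_required:
            self._fetch(L)                # fetch L additional bytes on demand
            self.on_demand_fetches += 1
        value = self.data[self.read_ptr]
        self.read_ptr += 1
        return value
```

In this sketch, reading an 80-byte header triggers two on-demand fetches of 8 bytes each, while setting `data_not_required` suppresses them, which mirrors the role the text gives to the data not required signal 307.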

MULTIPLE PACKET SWITCHES IN PARALLEL

FIG. 4 shows a block diagram of a system having a 
plurality of packet switches in parallel. 

In a parallel system 400, the packet memory 110 is
coupled in parallel to a plurality of (preferably two) packet
switches 100, each constructed substantially as described
with regard to FIG. 1. Each packet switch 100 takes its input
from the packet memory 110. However, the output of each
packet switch 100 is directed instead to a reorder stage 410,
and an output of the reorder stage 410 is directed to the
packet memory 110 for output to a network interface 101.

The output of each packet switch 100 is coupled in
parallel to the reorder stage 410. The reorder stage 410
comprises a plurality of reorder memories 411, preferably
two per packet switch 100 for a total of four reorder
memories 411. The reorder stage 410 operates similarly to
the reorder/rewrite memories 223 of the packet switch 100;
the packet switches 100 write their results to the reorder
memories 411, after which a reorder processor 412 reads
their results from the reorder memories 411 and writes them
in the original arrival order of the packets 103 to the packet
memory 110 for output to a network interface 101.
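
The reordering step can be illustrated with a short sketch. It assumes, as the text describes for the reorder/rewrite memories within a single packet switch, that packets are assigned to writers in round-robin order and that each writer cycles through its own reorder memories in sequence; applying the same discipline to the reorder stage 410 is an assumption here. The function name `reorder` and its parameters are hypothetical.

```python
from typing import List, Optional


def reorder(completions: List[int], n_writers: int, k_slots: int) -> List[int]:
    """Return packets in arrival order, given the order in which they finished.

    Packet `seq` is assumed to belong to writer `seq % n_writers` and to use
    that writer's reorder memory `(seq // n_writers) % k_slots`.
    """
    memories: List[List[Optional[int]]] = [
        [None] * k_slots for _ in range(n_writers)
    ]
    output: List[int] = []
    next_seq = 0  # next packet the reorder processor must emit

    for seq in completions:
        writer = seq % n_writers
        slot = (seq // n_writers) % k_slots
        if memories[writer][slot] is not None:
            # In hardware the writer would stall here (self-throttling).
            raise RuntimeError("reorder memory still occupied; writer must wait")
        memories[writer][slot] = seq

        # Drain every result that is now available in original arrival order.
        while True:
            w = next_seq % n_writers
            s = (next_seq // n_writers) % k_slots
            if memories[w][s] != next_seq:
                break
            output.append(next_seq)
            memories[w][s] = None
            next_seq += 1

    return output
```

For example, with two writers and two reorder memories each, results finishing in the order 1, 0, 3, 2, 4, 5 are emitted in the order 0 through 5, because the reader visits the memories in a fixed interleaved sequence and waits for each one to be filled.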

In a preferred embodiment where each packet switch 100
operates quickly enough to achieve an average switching
speed of about 500 kilopackets per second and the reorder
stage 410 operates quickly enough so that the pipeline is still
balanced, the parallel system 400 produces a throughput of
about 1,000 kilopackets switched per second.

Alternative embodiments of the parallel system 400 may
comprise larger numbers of packet switches 100 and 
reorder/rewrite memories 411. For example, in one alterna- 
tive embodiment, there are four packet switches 100 and 
eight reorder/rewrite memories 411, and the reorder stage 
410 is greatly sped up. In this alternative embodiment,
the parallel system 400 produces a throughput of about 
2,000 kilopackets switched per second. 
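
The throughput figures above follow from simple arithmetic, sketched here in Python. The function and parameter names are hypothetical, and the optional reorder_kpps cap is an assumption used to model a reorder stage that does not keep the pipeline balanced; the source states only that the reorder stage must be fast enough.

```python
def system_throughput_kpps(num_switches, per_switch_kpps, reorder_kpps=None):
    """Estimate parallel-system throughput in kilopackets per second.

    With a balanced pipeline the switches' rates simply add. If a reorder
    stage capacity is given, it caps the aggregate rate.
    """
    aggregate = num_switches * per_switch_kpps
    if reorder_kpps is None:
        return aggregate
    return min(aggregate, reorder_kpps)
```

Under these assumptions, two switches at about 500 kilopackets per second give about 1,000, and four give about 2,000, matching the figures stated for the preferred and alternative embodiments.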

Alternative Embodiments 

Although preferred embodiments are disclosed herein, 
many variations are possible which remain within the
concept, scope, and spirit of the invention, and these varia- 
tions would become clear to those skilled in the art after 
perusal of this application. 

We claim: 

1. A packet switch comprising

a fetch engine coupled to a source of packet headers; 

a plurality of fetch caches coupled to said fetch engine, 
and disposed to store at least portions of packet headers 
received therefrom;

a plurality of switch engines, each coupled to a corre- 
sponding one of said fetch caches, and disposed to read 
said portions of packet headers therefrom; 

a plurality of reorder/rewrite buffers, each said reorder/
rewrite buffer coupled to one of said switch engines,
and disposed to store pointers to packet headers 
received from at least one of said plurality of switch 
engines; 

a reorder/rewrite engine coupled to said plurality of 
reorder/rewrite buffers, and disposed to read pointers to
packet headers therefrom in an order said packet head- 
ers were originally received; 
a post-process queue coupled to said reorder/rewrite
engine, and disposed to store pointers to packet headers 
received therefrom; and 

a post-process engine coupled to said post-process queue,
and disposed to process said packet headers. 

2. A packet switch comprising: 
a switching stage; 

a fetch stage coupled to a source of packet headers, said 
fetch stage being disposed to fetch at least portions of 
packet headers and present said portions of packet 
headers in parallel to said switching stage; 

said switching stage coupled to said fetch stage, said 
switching stage being disposed to switch said packet 
headers asynchronously in parallel and present said 
packet headers in their original order to a post-process
stage; and 

said post-process stage coupled to said switching stage, 
and being disposed to perform protocol-specific pro- 
cessing on said packet headers. 

3. A packet switch as in claim 2, wherein said switching 
stage comprises a plurality of switch engines, each being 
disposed to receive a packet header and to produce a set of 
results for switching said packet header such that the input 
order of said packet header is preserved. 

4. A packet switch, comprising 

a fetch stage coupled to a source of packet headers, said 
fetch stage being disposed to fetch at least portions of 
packet headers and present said portions of packet 
headers in parallel to a switching stage, wherein said 
fetch stage comprises a fetch engine, said fetch engine 
being disposed to fetch a first block of M bytes of a 
packet header in response to a first signal, and being 
disposed to fetch an additional block of L bytes of said 
packet header in response to a second signal; 

said switching stage coupled to said fetch stage, said 
switching stage being disposed to switch said packet 
headers asynchronously in parallel and present said 
packet headers in their original order to a post-process 
stage; and 

said post-process stage coupled to said switching stage, 
and being disposed to perform protocol-specific pro- 
cessing on said packet headers. 

5. A packet switch as in claim 4, wherein said first signal 
indicates an empty fetch cache and wherein said second 
signal indicates a fetch cache with fewer than a selected 
number of unread bytes. 

6. A packet switch as in claim 4, wherein M is indepen- 
dent of said packet header. 

7. A packet switch as in claim 4, wherein M is adjusted 
according to said packet header. 

8. A packet switch as in claim 4, wherein L is equal to a 
byte width of an interface to said source of packet headers. 

9. A packet switch, comprising 

a fetch stage coupled to a source of packet headers, said 
fetch stage being disposed to fetch at least portions of 
packet headers and present said portions of packet 
headers in parallel to a switching stage, wherein said 
fetch stage comprises a plurality of fetch caches, each 
one of said fetch caches being coupled to said source of 
packet headers and each being disposed to store at least 
a portion of a packet header; 

said switching stage coupled to said fetch stage, said 
switching stage being disposed to switch said packet 
headers asynchronously in parallel and present said 
packet headers in their original order to a post-process 
stage; and 
said post-process stage coupled to said switching stage,
and being disposed to perform protocol-specific pro- 
cessing on said packet headers. 

10. A packet switch as in claim 9, wherein said switching 
stage comprises a plurality of switch engines, each being
coupled to one said fetch cache and each being disposed to 
receive said portion of a packet header and to produce a set 
of results for switching said packet header. 

11. A packet switch, comprising 

a fetch stage coupled to a source of packet headers, said
fetch stage being disposed to fetch at least portions of 
packet headers and present said portions of packet 
headers in parallel to a switching stage, wherein said 
fetch stage comprises a fetch engine coupled to said 
source of packet headers;

a plurality of fetch caches coupled to said fetch engine,
each said fetch cache comprising a plurality of
buffers;

wherein said fetch engine is disposed to write at least
a portion of each said packet header in sequence to
each said fetch cache in a selected buffer thereof;

wherein said switching stage comprises a switch engine 
for each said fetch cache, wherein each said switch 
engine is disposed to read at least a portion of said 
packet header in sequence from each said buffer of
said fetch cache; 

said switching stage coupled to said fetch stage, said 
switching stage being disposed to switch said packet 
headers asynchronously in parallel and present said
packet headers in their original order to a
post-process stage; and

said post-process stage coupled to said switching stage, 
and being disposed to perform protocol-specific pro- 
cessing on said packet headers. 

12. A packet switch as in claim 11, wherein each said fetch
cache is selected for storage in a round-robin manner.

13. A packet switch, comprising 

a fetch stage coupled to a source of packet headers, said 
fetch stage being disposed to fetch at least portions of 
packet headers and present said portions of packet
headers in parallel to a switching stage, wherein said 
switching stage comprises a plurality of reorder/rewrite 
memories, each one of said reorder/rewrite memories 
being disposed to store a pointer to a packet header; 

said switching stage coupled to said fetch stage, said
switching stage being disposed to switch said packet 
headers asynchronously in parallel and present said 
packet headers in their original order to a post-process
stage; and 

said post-process stage coupled to said switching stage, 
and being disposed to perform protocol-specific pro- 
cessing on said packet headers. 

14. A packet switch, comprising 

a fetch stage coupled to a source of packet headers, said
fetch stage being disposed to fetch at least portions of 
packet headers and present said portions of packet 
headers in parallel to a switching stage; 

said switching stage coupled to said fetch stage, said 
switching stage being disposed to switch said packet
headers asynchronously in parallel and present said 
packet headers in their original order to a post-process
stage; 

said post-process stage coupled to said switching stage, 
and being disposed to perform protocol-specific
processing on said packet headers; and

wherein said switching stage comprises 
a plurality of switch engines, each being disposed to
receive a packet header and to produce a set of 
results for switching said packet header; 

a plurality of reorder/rewrite memories, each one of 
said reorder/rewrite memories being disposed to
store a packet header; and 

a reorder/rewrite processor coupled to said plurality of
reorder/rewrite memories and disposed to receive
said packet headers from said reorder/rewrite
memories in an order in which said packet headers 
were originally received. 

15. A packet switch, comprising 

a fetch stage coupled to a source of packet headers, said 
fetch stage being disposed to fetch at least portions of 
packet headers and present said portions of packet 
headers in parallel to a switching stage; 

said switching stage coupled to said fetch stage, said 
switching stage being disposed to switch said packet 
headers asynchronously in parallel and present said 
packet headers in their original order to a post-process 
stage; 

said post-process stage coupled to said switching stage,
and being disposed to perform protocol-specific pro- 
cessing on said packet headers; and 
wherein said switching stage comprises

a plurality of switch engines, each being disposed to
receive a packet header and to produce a set of
results for switching said packet header; and

a plurality of reorder/rewrite memories, each one of
said reorder/rewrite memories being disposed to
store a packet header;

said reorder/rewrite memories being divided into sets,
each said set of reorder/rewrite memories being
assigned to and receiving outputs from exactly one
said switch engine.

16. A packet switch, comprising 

a fetch stage coupled to a source of packet headers, said 
fetch stage being disposed to fetch at least portions of 
packet headers and present said portions of packet 
headers in parallel to a switching stage; 

said switching stage coupled to said fetch stage, said 
switching stage being disposed to switch said packet 
headers asynchronously in parallel and present said 
packet headers in their original order to a post-process 
stage; 

said post-process stage coupled to said switching stage,
and being disposed to perform protocol-specific pro- 
cessing on said packet headers; and 
wherein said switching stage comprises 

a plurality of switching engines, each said switching 
engine having a plurality of reorder/rewrite memo- 
ries coupled thereto, each said switching engine 
being disposed to write in sequence to one of said 
plurality of reorder/rewrite memories; and 
a reorder/rewrite engine coupled to all said reorder/ 
rewrite memories, said reorder/rewrite engine being 
disposed to read in sequence from said reorder/ 
rewrite memories. 

17. A packet switch as in claim 16, wherein said reorder/ 
rewrite engine is disposed to alter at least portions of packet 
headers referenced by said reorder/rewrite memories. 

18. A packet switch, comprising 

a fetch stage coupled to a source of packet headers, said 
fetch stage being disposed to fetch at least portions of 
packet headers and present said portions of packet 
headers in parallel to a switching stage; 
said switching stage coupled to said fetch stage, said
switching stage being disposed to switch said packet
headers asynchronously in parallel and present said 
packet headers in their original order to a post-process 
stage;

said post-process stage coupled to said switching stage, 
and being disposed to perform protocol-specific pro- 
cessing on said packet headers, wherein said post- 
process stage comprises a plurality of post-processing 
memories, each one of said post-processing memories
being disposed to store a pointer to a packet header. 

19. A packet switch, comprising 

a fetch stage coupled to a source of packet headers, said 
fetch stage being disposed to fetch at least portions of 
packet headers and present said portions of packet
headers in parallel to a switching stage; 

said switching stage coupled to said fetch stage, said 
switching stage being disposed to switch said packet 
headers asynchronously in parallel and present said 
packet headers in their original order to a post-process 
stage; 

said post-process stage coupled to said switching stage, 
and being disposed to perform protocol-specific processing
on said packet headers, wherein said post-process stage
comprises a post-processor coupled to
said switching stage and disposed to alter at least a 
portion of a packet header responsive to a switching 
protocol. 

20. A system, comprising
a packet memory; 

a plurality of packet switches coupled to said packet 
memory; 

a plurality of reorder memories coupled to said plurality 
of packet switches; and

a reorder engine coupled to said plurality of reorder 
memories and disposed to receive packet headers from 
said reorder memories in an order in which they were 
originally received; 

wherein each packet switch comprises a fetch stage
coupled to said packet memory, said fetch stage being
disposed to fetch packet headers from said packet 
memory and present at least portions of packet headers 
in parallel to a switching stage; and said switching 
stage coupled to said fetch stage, said switching stage
being disposed to switch said packet headers asynchro- 
nously in parallel and present said packet headers in 
their original order to a post-process stage. 

21. A system as in claim 20, wherein said switching stage 
comprises a plurality of switch engines, each being disposed
to receive a packet header and to produce a set of results 
such that the input order of said packet header is preserved.

22. A system, comprising 

a packet memory;
a plurality of reorder memories; 

a reorder engine coupled to said plurality of reorder 
memories and disposed to receive packet headers from 
said reorder memories in an order in which they were 
originally received; and

a plurality of packet switches coupled to said packet 
memory and said plurality of reorder memories, 
wherein each one of said plurality of packet switches 
comprises 

a fetch stage coupled to said packet memory, said fetch
stage being disposed to fetch packet headers from
said packet memory and present at least portions of
packet headers in parallel to a switching stage,
wherein said fetch stage comprises a fetch engine, 
said fetch engine being disposed to fetch a first block 
of M bytes of a packet header in response to a first 
signal, and being disposed to fetch an additional 
block of L bytes of said packet header in response to 
a second signal; and 
said switching stage coupled to said fetch stage, said 
switching stage being disposed to switch said packet 
headers asynchronously in parallel and present said 
packet headers in their original order to a post- 
process stage. 

23. A system as in claim 22, wherein said first signal 
indicates an empty fetch cache and wherein said second 
signal indicates a fetch cache with fewer than a selected 
number of unread bytes. 

24. A system, comprising 
a packet memory; 

a plurality of reorder memories; 

a reorder engine coupled to said plurality of reorder 
memories and disposed to receive packet headers from 
said reorder memories in an order in which they were 
originally received; and 

a plurality of packet switches coupled to said packet 
memory and said plurality of reorder memories, 
wherein each one of said plurality of packet switches 
comprises 

a fetch stage coupled to said packet memory, said fetch
stage being disposed to fetch packet headers from 
said packet memory and present at least portions of 
packet headers in parallel to a switching stage, 
wherein said fetch stage comprises a plurality of 
fetch caches, each one of said fetch caches being 
coupled to said source of packet headers and each 
being disposed to store at least a portion of a packet 
header; and 

said switching stage coupled to said fetch stage, said 
switching stage being disposed to switch said packet 
headers asynchronously in parallel and present said 
packet headers in their original order to a post- 
process stage. 

25. A system as in claim 24, wherein said switching stage 
comprises a plurality of switch engines, each being coupled 
to one said fetch cache and each being disposed to receive 
a packet header and to produce a set of results for switching 
said packet header. 

26. A system, comprising 
a packet memory; 

a plurality of reorder memories; 

a reorder engine coupled to said plurality of reorder 
memories and disposed to receive packet headers from 
said reorder memories in an order in which they were 
originally received; and 

a plurality of packet switches coupled to said packet 
memory and said plurality of reorder memories, 
wherein each one of said plurality of packet switches 
comprises 

a fetch stage coupled to said packet memory, said fetch
stage being disposed to fetch packet headers from 
said packet memory and present at least portions of 
packet headers in parallel to a switching stage, 
wherein said fetch stage comprises 

a fetch engine coupled to said source of packet headers; 
and 

a plurality of fetch caches coupled to said fetch engine, 
each said fetch cache comprising a plurality of 
buffers; 
wherein said fetch engine is disposed to write at least
a portion of each said packet header in sequence 
from each said fetch cache in a selected buffer 
thereof; 

wherein said switching stage comprises a switch engine
for each said fetch cache, wherein each said switch 
engine is disposed to read at least a portion of said 
packet header in sequence from each said buffer of 
said fetch cache; 

said switching stage coupled to said fetch stage, said 
switching stage being disposed to switch said packet 
headers asynchronously in parallel and present said 
packet headers in their original order to a post- 
process stage. 

27. A system, comprising 

a packet memory;
a plurality of reorder memories; 

a reorder engine coupled to said plurality of reorder 
memories and disposed to receive packet headers from 
said reorder memories in an order in which they were 
originally received; and

a plurality of packet switches coupled to said packet 
memory and said plurality of reorder memories, 
wherein each one of said plurality of packet switches 
comprises 

a fetch stage coupled to said packet memory, said fetch
stage being disposed to fetch packet headers from 
said packet memory and present at least portions of 
packet headers in parallel to a switching stage; 
wherein said switching stage comprises 

a plurality of switch engines, each being disposed to
receive a packet header and to produce a set of 
results for switching said packet header; 
a plurality of reorder/rewrite memories, each one of 
said reorder/rewrite memories being disposed to 
store a packet header; and
a reorder/rewrite processor coupled to said plurality 
of reorder/rewrite memories and disposed to 
receive said packet headers from said reorder/ 
rewrite memories in an order in which said packet 
headers were originally received;
said switching stage coupled to said fetch stage, said 
switching stage being disposed to switch said 
packet headers asynchronously in parallel and 
present said packet headers in their original order 
to a post-process stage.

28. A system as in claim 27, wherein said reorder/rewrite 
processor is disposed to alter at least portions of said packet 
headers referenced by said reorder/rewrite memories. 

29. A system, comprising 

a packet memory;
a plurality of reorder memories; 

a reorder engine coupled to said plurality of reorder 
memories and disposed to receive packet headers from 
said reorder memories in an order in which they were 
originally received; and

a plurality of packet switches coupled to said packet 
memory and said plurality of reorder memories, 
wherein each one of said plurality of packet switches 
comprises 

a fetch stage coupled to said packet memory, said fetch
stage being disposed to fetch packet headers from 
said packet memory and present at least portions of 
packet headers in parallel to a switching stage; 
wherein said switching stage comprises 

a plurality of switch engines, each being disposed to
receive a packet header and to produce a set of 
results for switching said packet header; and 
a plurality of reorder/rewrite memories, each one of
said reorder/rewrite memories being disposed to 
store a packet header; 

said reorder/rewrite memories being divided into 
sets, each said set of reorder/rewrite memories 
being assigned to and receiving outputs from 
exactly one said switch engine; and 

said switching stage coupled to said fetch stage, said 
switching stage being disposed to switch said 
packet headers asynchronously in parallel and 
present said packet headers in their original order 
to a post-process stage. 

30. A system, comprising 
a packet memory; 

a plurality of reorder memories; 

a reorder engine coupled to said plurality of reorder 
memories and disposed to receive packet headers from 
said reorder memories in an order in which they were 
originally received; and 

a plurality of packet switches coupled to said packet 
memory and said plurality of reorder memories,
wherein each one of said plurality of packet switches 
comprises 

a fetch stage coupled to said packet memory, said fetch
stage being disposed to fetch packet headers from 
said packet memory and present at least portions of 
packet headers in parallel to a switching stage; 
wherein said switching stage comprises 

a plurality of switching engines, each said switching 
engine having a plurality of reorder/rewrite 
memories coupled thereto, each said switching 
engine being disposed to write in sequence to one 
of said plurality of reorder/rewrite memories; and 
a reorder/rewrite engine coupled to all said reorder/ 
rewrite memories, said reorder/rewrite engine 
being disposed to read in sequence from said 
reorder/rewrite memories; 
said switching stage coupled to said fetch stage, said 
switching stage being disposed to switch said 
packet headers asynchronously in parallel and 
present said packet headers in their original order 
to a post-process stage. 

31. A system, comprising 
a packet memory; 

a plurality of reorder memories; 

a reorder engine coupled to said plurality of reorder 
memories and disposed to receive packet headers from 
said reorder memories in an order in which they were 
originally received; and 

a plurality of packet switches coupled to said packet 
memory and said plurality of reorder memories, 
wherein each one of said plurality of packet switches 
comprises 

a fetch stage coupled to said packet memory, said fetch 
stage being disposed to fetch packet headers from 
said packet memory and present at least portions of 
packet headers in parallel to a switching stage; 

said switching stage coupled to said fetch stage, said 
switching stage being disposed to switch said packet 
headers asynchronously in parallel and present said 
packet headers in their original order to a post- 
process stage, wherein said switching stage com- 
prises a plurality of reorder/rewrite memories, each 
one of said reorder/rewrite memories being disposed 
to store a pointer to a packet header. 
32. A method of switching packets, said method comprising

fetching a sequence of packet headers corresponding to
said packets from a source of said packet headers;

presenting said packet headers in parallel to a plurality of
switch engines;

operating said switch engines to switch said packets
asynchronously in parallel;

presenting switched packet headers in their original order
to a post-processor; and

operating said post-processor to perform protocol-specific 
processing on said packet headers. 

33. A method of switching packets, said method comprising

fetching a sequence of packet headers corresponding to 
said packets from a source of said packet headers, 
wherein said step of fetching includes fetching a first 
block of M bytes of a packet header in response to a 
first signal and fetching an additional block of L bytes
of said packet header in response to a second signal; 

presenting said packet headers in parallel to a plurality of 
switch engines; 

operating said switch engines to switch said packets 
asynchronously in parallel; 

presenting switched packet headers in their original order 
to a post-processor; and 

operating said post-processor to perform protocol-specific 
processing on said packet headers.

34. A method as in claim 33, wherein said first signal 
indicates an empty fetch cache and wherein said second 
signal indicates a fetch cache with fewer than a selected 
number of unread bytes. 

35. A method of switching packets, said method comprising

fetching a sequence of packet headers corresponding to 
said packets from a source of said packet headers, 
wherein said step of fetching includes storing said 
packet headers in sequence into a plurality of fetch
caches; 

presenting said packet headers in parallel to a plurality of
switch engines;

operating said switch engines to switch said packets
asynchronously in parallel;

presenting switched packet headers in their original order
to a post-processor; and

operating said post-processor to perform protocol-specific
processing on said packet headers.

36. A method of switching packets, said method compris- 
ing 

fetching a sequence of packet headers corresponding to 
said packets from a source of said packet headers; 

presenting said packet headers in parallel to a plurality of
switch engines;

operating said switch engines to switch said packets
asynchronously in parallel;

presenting switched packet headers in their original order
to a post-processor; and

operating said post-processor to perform protocol-specific
processing on said packet headers, wherein said step of
operating said post-processor stage comprises altering
at least a portion of a packet header.

37. A method of switching packets, said method compris- 
ing 

fetching a sequence of packet headers corresponding to 
said packets from a source of said packet headers; 

presenting said packet headers in parallel to a plurality of 
switch engines; 

operating said switch engines to switch said packets 
asynchronously in parallel, wherein said step of oper- 
ating said switch engines includes coupling each said 
packet header to a selected fetch cache, coupling each 
said fetch cache to a selected switch engine, and 
coupling a set of results from said selected switch 
engine to a reorder/rewrite memory; 

presenting switched packet headers in their original order 
to a post-processor; and 

operating said post-processor to perform protocol-specific 
processing on said packet headers. 

38. A method as in claim 37, wherein said reorder/rewrite 
memories are divided into sets, each said set of reorder/ 
rewrite memories being assigned to and receiving outputs 
from exactly one said switch engine. 

39. A system, including:
a packet memory; 

a plurality of packet switches coupled to said packet 
memory, wherein each packet switch includes a fetch 
stage and a switching stage; 

said fetch stage being coupled to said packet memory and 
disposed to fetch packet headers from said packet 
memory and present at least portions of said packet 
headers in parallel to said switching stage; 

said switching stage being coupled to said fetch stage and 
disposed to switch said packet headers asynchronously 
in parallel and present said packet headers in their 
original order; 

a plurality of reorder memories coupled to said plurality 
of packet switches; and 

a reorder engine coupled to said plurality of reorder 
memories and disposed to receive packet headers from 
said reorder memories in an order in which they were 
originally received. 

40. A system as in claim 39, wherein said switching stage 
comprises a plurality of switch engines, each being
disposed to receive a packet header and to produce a set of 
results such that the input order of said packet header is 
mimicked. 